Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Storing system descriptor #265

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

sdjukicTT
Copy link
Contributor

@sdjukicTT sdjukicTT commented Feb 19, 2025

Ticket

#201

Problem description

Default system descriptor used in TTIRToTTNNBackendPipeline can't be used for multichip, actual current system descriptor is needed.

What's changed

We store the system descriptor and then set options.systemDescPath.

Checklist

  • New/Existing tests provide coverage for changes

Copy link

github-actions bot commented Feb 19, 2025

TestsPassed ✅Skipped ⚠️Failed
TT-XLA Tests599 ran434 passed165 skipped0 failed
TestResult
No test annotations available

@codecov-commenter
Copy link

codecov-commenter commented Feb 19, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.15%. Comparing base (0a01029) to head (7a73e96).
Report is 5 commits behind head on main.

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #265      +/-   ##
==========================================
+ Coverage   78.06%   78.15%   +0.09%     
==========================================
  Files          21       21              
  Lines        1044      989      -55     
==========================================
- Hits          815      773      -42     
+ Misses        229      216      -13     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -197,6 +197,7 @@ void ModuleBuilder::convertFromTTIRToTTNN(
mlir::PassManager ttir_to_ttnn_pm(mlir_module.get()->getName());

mlir::tt::ttnn::TTIRToTTNNBackendPipelineOptions options;
options.systemDescPath = "system_desc.ttsys";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we extract this path into client_instance or module_builder static attribute? Also, it might be nice to have a folder instead of just storing it in root. That way, we could even delete the system_desc when ClientInstance.destroy() is called, leaving less things in memory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a folder?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be very of keeping thing in root, and this adds another opportunity to be more verbose (for example, by calling the folder "system_descriptors"). However, I might be wrong. @mrakitaTT do you have an opinion on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Touched on this topic in my other comment

@sdjukicTT sdjukicTT force-pushed the sdjukic/store-system-desc branch from 90d1d7f to e1cd701 Compare February 21, 2025 19:15
@@ -40,6 +40,8 @@ class ModuleBuilder {
// code. Currently hardcoded to one, as we only support one-chip execution.
size_t getNumAddressableDevices() const { return 1; }

static constexpr std::string_view system_desc_path = "system_desc.ttsys";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason why this is not just a plain string?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this can be a plain string, there is no benefit to keep it as string_view. See more info here regarding string vs string_view (std string_view is implementing the same thing as abseil's).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we would store tt::runtime::SystemDesc object inside the ClientInstance and pass it through the Compile function to the ModuleBuilder which would then pass it to the createTTIRToTTNNBackendPipeline. Since tt-mlir currently only supports passing a path to the descriptor and is not easy to change to support passing already parsed object, as a temporary solution we need to save it on the disk as pass the path to the compiler.

I propose these changes:

  • Create a member variable tt::runtime::SystemDesc m_system_descriptor in ClientInstance which will be set in the ClientInstance::PopulateDevices() function from the system_desc variable.
  • Create a member variable std::string m_cached_system_descriptor_path which should be set inside the ClientInstance constructor to the combination of std::filesystem::temp_directory_path() directory and some file name that should be unique from other programs, for example tt_pjrt_system_descriptor plus maybe name of the device architecture, and maybe even some client id if there is some in pjrt structures.
  • In the ClientInstance::PopulateDevices() initialize m_system_descriptor with system_desc, and then store it into m_cached_system_descriptor_path with a TODO comment to remove that once the support in tt-mlir is done to pass the system descriptor object. Check if store method checks for errors, if not we should check.
  • In the ClientInstance::Compile function pass the m_cached_system_descriptor_path to the module_builder_->buildModule call.
  • In the ClientInstance::~ClientInstance() remove the cached system descriptor, as you already do.

@@ -28,6 +28,7 @@ ClientInstance::ClientInstance(std::unique_ptr<Platform> platform)

ClientInstance::~ClientInstance() {
DLOG_F(LOG_DEBUG, "ClientInstance::~ClientInstance");
std::remove(ModuleBuilder::system_desc_path.data());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is ClientInstance destructor doing this? Why even do this in the first place? I don't think I have ever seen a string being cleaned up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::remove is cleaning up a file stored in that path

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants